1  Objective

GRASP Laboratory research combines Active Perception and Robotics to develop devices capable of
performing sophisticated tasks. This research concentrates specifically on multispectral image
processing, 3-D shape identification, decision making and robot actuation. Perception via
manipulation is combined with information obtained from a variety of sensors to establish one or
more features or properties of an unstructured environment. This links the exploration of an
unknown environment through visual sensing, range measurement, manipulation and physical probing.
It is a direct application of our theoretical work in robust multisensor fusion and in techniques
for integrating data from multiple modalities.

One of the primary objectives of this research is to investigate coordination and communication
in multi-agent systems. In particular, multiple agents explore and adapt to their surroundings,
and organize and configure themselves to perform required tasks, possibly with the assistance of
human agents.
2  Approach


Our approach is based on an “advice-based small-team” architecture. The agents are heterogeneous
both in their scope of applicability or functionality and in their capabilities or competence.
Their connectivity depends on whether they share a domain of applicability and/or task/subtasks or
operate independently. An example of a shared domain is an obstacle monitored by both vision and
acoustic sensors; these two agents perform redundant or complementary functions. On the other
hand, the force sensors that monitor the contact and sliding/rolling of an object held by two
palms, and the acoustic sensors that monitor the vehicles for obstacle avoidance, are an example
of independent agents. The advice-based small-team architecture is new in that it provides as much
autonomy as possible to individual agents, yet makes all available information accessible to the
other agents in the spirit of cooperation. All the agents know the common task of transporting an
object from place A to place B. Since all the agents are physical, real-time operation becomes a
central issue.
3  Progress

Progress  in  the  last  year  of  the  project  has  been  made  in  the  areas  of  control,  the  observer  agent, 
and  multisensor  fusion. 


3.1  Control 

The control of individual mobile manipulators has been investigated. Controlling a mobile
manipulator involves coordinating the locomotion of the mobile platform with the manipulation of
the arm. This coordination is important for a number of reasons, including redundancy in mobility,
differences in dynamic response, nonholonomic constraints, and dynamic interactions. Modeling the
mobile platform as a nonholonomic dynamic system, we have developed and experimentally tested a
control algorithm for coordinating locomotion and manipulation. Using this algorithm, while an
operator drags the manipulator in any direction in a horizontal plane, the mobile platform brings
the manipulator into the configuration with the maximum manipulability measure.
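The report does not state which manipulability measure is used. As a minimal sketch, assuming Yoshikawa's measure w = sqrt(det(J J^T)) and a planar two-link arm with hypothetical link lengths l1 and l2, the measure reduces to w = l1 * l2 * |sin(theta2)|. It is largest when the elbow is at +/-90 degrees, which places the hand at distance sqrt(l1^2 + l2^2) from the shoulder. A platform controller could therefore try to hold the hand at that radius. The function `platform_correction` is illustrative only and is not the algorithm described above.

```python
import math

import numpy as np


def jacobian_2link(theta1: float, theta2: float, l1: float, l2: float) -> np.ndarray:
    """Position Jacobian of a planar two-link arm (assumed arm model)."""
    s1, c1 = math.sin(theta1), math.cos(theta1)
    s12, c12 = math.sin(theta1 + theta2), math.cos(theta1 + theta2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def manipulability(theta1: float, theta2: float, l1: float, l2: float) -> float:
    """Yoshikawa's measure w = sqrt(det(J J^T)); equals l1*l2*|sin(theta2)| here."""
    J = jacobian_2link(theta1, theta2, l1, l2)
    return math.sqrt(max(np.linalg.det(J @ J.T), 0.0))


def best_reach(l1: float, l2: float) -> float:
    """Shoulder-to-hand distance at which w is maximal (elbow at +/-90 degrees)."""
    return math.hypot(l1, l2)


def platform_correction(hand_xy, base_xy, l1: float, l2: float) -> np.ndarray:
    """Hypothetical base set-point: slide the base along the base-to-hand line
    so that the hand ends up at the maximum-manipulability radius."""
    hand = np.asarray(hand_xy, dtype=float)
    base = np.asarray(base_xy, dtype=float)
    d = hand - base
    dist = np.linalg.norm(d)
    if dist == 0.0:
        return base  # direction undefined; leave the base where it is
    return hand - d / dist * best_reach(l1, l2)
```

For example, with unit links and the hand 3 m ahead of the base, the sketch moves the base so that the hand sits sqrt(2) m away. In a real controller this set-point would feed the nonholonomic platform control, which is not shown.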

In  further  research  we  will  integrate  the  wrist  force/torque  sensor  in  the  control  algorithm,  thus 
enabling  the  mobile  manipulator  to  maintain  contact  with  and  follow  a  moving  surface  rather  than 
being  dragged. 




3.2  Observer  Agent 
The function of the Observer Agent is to recognize the environment in real time, in particular the
free path and the obstacles. While numerous algorithms in the literature describe how to extract
optical flow, range and motion parameters, most of them are too complex to run in real time
(15 frames per second) and are not robust enough. For real-time processing we have concentrated on
proper data reduction mechanisms (data selection), via the use of different optics, and on
model-based tracking. For robustness, we have concentrated on removing highlights and shadows
using color and active light. These algorithms are based on point transformations (the difference
of two images) and hence are highly parallel.
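Because these are point transformations, each output pixel depends only on the same pixel in the input images. The following minimal sketch of shadow detection with active light rests on two assumptions: the two gray-level frames are registered, with a hypothetical active source switched on in one and off in the other, and the threshold `min_gain` is illustrative rather than a value from the report. Pixels that fail to brighten when the source is on are taken to lie in its shadow.

```python
import numpy as np


def active_light_shadow_mask(frame_on: np.ndarray, frame_off: np.ndarray,
                             min_gain: float = 10.0) -> np.ndarray:
    """Return True where a pixel lies in the shadow of the active light.

    The difference image is a per-pixel (point) operation, so it vectorizes
    and parallelizes trivially. min_gain is an illustrative threshold in
    gray levels, not a value taken from the report.
    """
    # Use a signed type so that noise cannot wrap around in uint8 arithmetic.
    diff = frame_on.astype(np.int16) - frame_off.astype(np.int16)
    return diff < min_gain
```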

3.3  Multisensor  Fusion 

Our  multi-agent  system  employs  the  following  sensors:  multiple  cameras  which  simultaneously 
provide  images  from  multiple  fields  of  view  and  varying  depths  of  field;  digital  compasses  on  the 
mobile  agents;  odometry  from  the  wheels  of  the  mobile  agents;  acoustic  range  sensors  on  the  mobile 
agents;  infrared  proximity  sensors  on  the  mobile  agents;  and  force  and  torque  sensors  on  the  end- 
effectors  of  manipulator  agents.  These  sensors  provide  information  of  different  types  and  qualities. 

One research issue is the delineation of decision models for combining, or fusing, information of
a common type with differing qualities, as well as information of different types with varying
quality. We have developed a mathematical model for fusing information of a common type where one
source provides coarse-grained information with good reliability and the other provides
fine-grained information but may be subject to serious sporadic errors. For example, we can use
optics or infrared technology for coarse range determination; within specified domains of
operation, these coarse range estimates will be reliable. We can use acoustic range sensing for
fine-grained range information, with the caveat that the acoustic range information may be
seriously in error due to multipath or insufficient target cross-section. These models and
techniques rely on neither highly refined sensor noise models nor highly accurate sensor position
information; both of these additional sources of error are accounted for in this methodology.
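The mathematical model itself is not given here. As an illustration only, a simple gating rule captures the idea. A coarse but reliable sensor (optics or infrared) supplies an interval assumed to contain the true range. A fine but sporadically wrong sensor (acoustic) is trusted only when its reading falls inside that interval. The function name and the interval representation are hypothetical, and this rule should not be read as the authors' model.

```python
from typing import Tuple


def fuse_range(coarse: Tuple[float, float], fine: float) -> Tuple[float, bool]:
    """Combine a reliable coarse interval with an unreliable fine reading.

    coarse: (low, high) bounds from the coarse sensor, assumed to contain the
            true range within its domain of operation.
    fine:   one fine-grained reading (e.g. acoustic) that may be grossly wrong
            because of multipath or a small target cross-section.

    Returns (estimate, fine_accepted).
    """
    low, high = coarse
    if low > high:
        raise ValueError("coarse interval must satisfy low <= high")
    if low <= fine <= high:
        # Consistent with the coarse bound: keep the fine sensor's resolution.
        return fine, True
    # Otherwise treat the fine reading as a sporadic gross error and fall back
    # to the centre of the coarse interval.
    return (low + high) / 2.0, False
```

Note that the rule needs no noise model for the acoustic sensor, only the assumption that the coarse interval is trustworthy.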


4  Accomplishments 

Accomplishments  in  the  last  year  of  the  project  include: 

•  Recognition of highlights on dielectric and metallic materials.

•  Recognition  of  shadows  using  active  light. 

•  Simultaneous  real-time  (15  frames  per  second)  model-based  2D  tracking  of  multiple  objects. 

•  Near-optimal  robust  fixed-size  confidence  procedures. 

•  Robustness  with  respect  to  noise  distribution  uncertainty,  applicable  to  essentially  any  class 
of  noise  distributions  which  have  smooth  (non-atomic)  boundaries. 

•  Near-optimal  performance  obtained  using  easily  computed  non-monotone  functions. 


5  Technical  Reports 
University  of  Pennsylvania,  MS-CIS-91-87,  GRASP  LAB  284.

3.  Eric  Paljug,  Tom  Sugar,  Vijay  Kumar  and  Xiaoping  Yun.  Important  Considerations  In  Force 
Control  With  Applications  To  Multi-Arm  Manipulation.  Technical  Report,  Department  of 
Computer  and  Information  Science,  University  of  Pennsylvania,  MS-CIS-91-88,  GRASP  LAB 
287. 

4.  Sanjay  Agrawal.  Robotic  Manipulation  Using  A  Behavioral  Framework.  Technical  Report 
(Dissertation),  Department  of  Computer  and  Information  Science,  University  of  Pennsylvania, 
MS-CIS-91-90,  GRASP  LAB  287. 

5.  Ruzena  Bajcsy  and  Mario  Campos.  Active  and  Exploratory  Perception.  Technical  Report, 
6.  Ruzena  Bajcsy.  An  Active  Observer.  Technical  Report,  Department  of  Computer  and  Information  Science,


8.  Gareth  D.  Funka-Lea.  Vision  For  Navigation  Using  Two  Road  Cues.  Technical  Report, 
1  Abstract

In this paper we present a framework for research into the development of an Active Observer. The
components of such an observer are the low- and intermediate-level visual processing modules.
Some of these modules have been adapted from the community and some have been investigated in the
GRASP Laboratory, most notably modules for understanding surface reflections via color and
multiple views, and for segmenting three-dimensional images into first- or second-order surfaces
via superquadric/parametric volumetric models. However, the key problem in Active Observer research
is the control structure of its behavior based on the task and the situation. This control
structure is modeled by a formalism called Discrete Event Dynamic Systems (DEDS).

2  Introduction 

We are interested in the development of an Active Observer. An Active Observer is an agent that
has the capability to observe scenes, objects and situations and to deliver the observed
information to human, manipulatory, and mobile agents. Naturally there are more questions than
answers. We list a few that are of particular interest to us. What are the components/modules
that such an observer must have? How are these components interconnected, i.e., what is the
architecture of such an agent? Some of the modules correspond to certain visual cues. We take as
given that our observer has several such cues. In that case, the subsequent questions are: How
are the results from these cues integrated? When are they invoked? How is the selection process
conducted/guided? Which cue is employed and when? Finally, what kind of information/messages does
the observer deliver to other agents?

Towards this end, for the last two years we have concentrated on developing a theoretical and
experimental understanding of some of the cues/components, of the integration and selection of
some cues, and of control strategies for observation. In particular, in cue development we have
tried to understand surface reflections by color and multiple views. An important finding of this
work, which is described in detail in Section 3, is that multiple viewpoints provide useful
information for discriminating between specular and Lambertian reflections, both from dielectrics
and from metals. In Section 4, we describe a system for segmenting a three-dimensional scene into
components that can be modeled by a superquadric parametric fit. This system uses surface
segmentation, contour segmentation and gross volumetric segmentation in cooperation in order to
arrive at the proper result. The scenes are of moderate complexity (up to 10 parts), but no other
assumptions are made about the objects or their parts. This work illustrates the familiar fact
that no single module, cue or approach can handle the perceptual variety of real-world data, even
at moderate complexity. Multiple cues are necessary, and hence a great deal of thought has to go
into the integration policy and control structure. In Section 5, we present a formal model of an
observer agent. This model is based on the theory of Discrete Event Dynamic Systems (DEDS), which
allows us to predict unequivocally the observation capabilities of an observer. For this to be
possible, the observer must know the discrete events of the task; so far these are specified by
the designer. Finally, in Section 6 we describe in detail the recent development of a CCD chip
(the Retina) with space-variant resolution.
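In a DEDS model, the task is described by discrete states and by the events that cause transitions between them. The sketch below models such a system as a finite automaton in which all states and events are hypothetical. The observer keeps the set of states consistent with the events it has seen. In this simplified sense, the task state becomes known once that set shrinks to a single state. This is an illustration of the formalism, not the observer model presented in this paper.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical task model for an observer: (state, event) -> next state.
Transitions = Dict[Tuple[str, str], str]

TRANSITIONS: Transitions = {
    ("searching", "target_seen"): "tracking",
    ("tracking", "target_lost"): "searching",
    ("tracking", "target_reached"): "done",
}


def update_estimate(estimate: Set[str], event: str,
                    delta: Transitions) -> Set[str]:
    """States reachable from any currently possible state via `event`."""
    return {delta[(s, event)] for s in estimate if (s, event) in delta}


def run_observer(initial: Iterable[str], events: Iterable[str],
                 delta: Transitions) -> Set[str]:
    """Propagate the set of possible states through an observed event sequence."""
    estimate = set(initial)
    for event in events:
        estimate = update_estimate(estimate, event, delta)
    return estimate


def is_resolved(estimate: Set[str]) -> bool:
    """Simplified criterion: the state is known once one candidate remains."""
    return len(estimate) == 1
```

An event that no candidate state can produce empties the estimate. In this sketch, that signals that the observed events are inconsistent with the designer's task model.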

3  Understanding of Reflection Properties Using Color and Multiple Views

Recently there has been growing interest in the detection of specularity in both basic and
applied computer vision research. In general, detecting specularities from a single gray-level
image is a physically underconstrained problem, and more information needs to be collected in
physically sensible ways to solve it. Successful development of an algorithm for image data
collection and interpretation necessarily depends on physical models that describe how surfaces
appear according to the illumination and reflectance properties and the sensor characteristics.
Recently the computer vision field has increasingly incorporated methodologies derived from
physical principles of image formation and sensing [7]. So far there have been three types of
approaches to solving the problem of specularity detection through the collection of more images:
(1) with different light directions, (2) with different sensor polarization angles, and (3) with
different color sensors.

The photometric-stereo-type approaches consider the specular and Lambertian reflectance properties
for obtaining object shape using more than two light directions [4] [9] [11]. Since the direction
and the degree of collimation of the illumination need to be strictly controlled, application of
these approaches is restricted to dark-room environments. The polarization method analyzes the
polarization of reflected light and detects specularities from dielectrics and metals [12]. The
polarization approach places some restrictions on the incident illumination direction with respect
to surface orientation.

The dichromatic model [10] proposed by Shafer has been the key model for recent specularity
detection algorithms using color [8] [5] [6] [3]. The basic limitation of the color algorithms is
that objects must be colored dielectrics for the dichromatic model to apply. For color image
segmentation, it is usually assumed that object surface reflectance is spatially piecewise uniform
in color and that scene illumination is singly colored. We have previously developed a color image
segmentation algorithm for separating diffuse as well as sharp specularities, and
inter-reflections, from Lambertian reflections [3].

Our recent research has focused on developing specularity detection or separation methods that
require modification only of the sensors, not of the environment. In other words, they are methods
that are active in modifying sensors but passive with respect to the environment. There are two
kinds of environment modification: relocation and re-orientation of objects by robot manipulation,
and illumination change. The prime example of illumination change is the light switching used by
the photometric-stereo-type methods. Since the lighting needs to be strictly controlled, the
photometric-stereo-type approaches are applicable only to inspection in dark rooms.

Strict illumination control is not always possible when investigating surface reflection
properties in many general environments. Examples include outdoor inspection, indoor or outdoor
navigation, and exploratory environments. Even for indoor inspection, a well-controlled dark room
is not always available.

For general environments without strict illumination control, only the sensors are controllable,
and color and polarization are the possible cues. Another possibility is to move the observer,
which has not previously been used for investigating reflection properties in computer vision. The
idea of moving the observer was directly motivated by the concept of active vision [2]. For
low-level vision problems of shape or structure, it has been demonstrated that many ill-posed
problems become well-posed if more information is collected by active sensors [1]. Although the
paradigms for shape or structure based on feature correspondence cannot be applied directly to the
study of reflectance properties, the idea of a moving observer motivated the investigation of new
principles, based on physical modeling, for obtaining more information.

In this paper, we suggest the use of multiple views for the detection of specularity by
introducing two algorithms. The first algorithm, called spectral differencing, uses color
information from a small number of multiple views. The second algorithm is called view sampling.
Using many gray-level views collected over a wide angle, view sampling reconstructs object
structure and detects specularities. An important principle used by both algorithms is Lambertian
consistency: the well-known fact that Lambertian reflection does not change its brightness or
spectral content with viewing direction, whereas specular reflection, or a mixture of Lambertian
and specular reflections, can.
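Lambertian consistency can be turned into a per-point test. The minimal sketch below assumes that the brightness of one surface point has already been measured in several registered views; the correspondence step is not shown, and the tolerance `rel_tol` is an illustrative assumption. A Lambertian point keeps nearly the same brightness across views, whereas a specular or mixed point varies.

```python
import numpy as np


def is_lambertian(samples, rel_tol: float = 0.05) -> bool:
    """Classify one surface point from its brightness in several views.

    samples: 1-D sequence of brightness values, one per view.
    rel_tol: hypothetical tolerance on (max - min) relative to the mean.
    """
    v = np.asarray(samples, dtype=float)
    mean = v.mean()
    if mean <= 0.0:
        return True  # a uniformly dark point shows no view dependence
    return bool((v.max() - v.min()) / mean <= rel_tol)
```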

A problem associated with the use of multiple views with color is what kind of extra spectral
information can be obtained by moving a color camera without considering object geometry. If there
is any, it may relax the limiting assumptions imposed on the object and illumination domain by the
color segmentation approaches, and provide higher confidence in detecting specularities.

The spectral differencing algorithm is based on the observation that the presence of specular
reflections can be inferred from the difference in the distribution of pixel colors between two
color images. According to Lambertian consistency, the color distribution of pixels from purely
Lambertian reflections should be consistent regardless of viewpoint. On the other hand,
specularities, or mixtures of specular and Lambertian reflections, can change the distribution of
pixel colors between two views.

The spectral differencing algorithm does not require any assistance from image segmentation or
geometrical manipulation. Since the algorithm relies on neither segmentation nor the dichromatic
model, it is applicable to dielectric objects with nonuniform reflectance and to metals under
multiply colored illumination. Figures 1 and 2 show two dielectric objects with variation in
reflectance and a metallic object with neutral reflectance color. Two fluorescent light tubes and
a tungsten light bulb are used for illumination, and there are inter-reflections from the walls.
MSD(0 → 1) shows the regions of new color distribution in view 0 compared to view 1, and
MSD(1 → 0) the regions of new color distribution in view 1 compared to view 0. It can be seen that,
under multiply colored and extended illumination, most of the specularities are detected by
spectral differencing.
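The following is a simplified sketch of the spectral-differencing idea. It assumes RGB images stored as NumPy arrays and a coarse (r, g) chromaticity quantization chosen only for illustration; the actual color-space treatment and thresholds are not given here. A pixel of view a is flagged when its quantized chromaticity never occurs anywhere in view b, which approximates the regions MSD(a → b). Chromaticity discards brightness, so Lambertian shading changes alone do not trigger the flag.

```python
import numpy as np


def quantized_chroma(img: np.ndarray, bins: int = 16) -> np.ndarray:
    """Map each RGB pixel to a quantized (r, g) chromaticity bin index."""
    rgb = img.astype(float)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on black pixels
    chroma = rgb[..., :2] / total  # r = R/(R+G+B), g = G/(R+G+B), in [0, 1]
    q = np.minimum((chroma * bins).astype(int), bins - 1)
    return q[..., 0] * bins + q[..., 1]


def msd(view_a: np.ndarray, view_b: np.ndarray, bins: int = 16) -> np.ndarray:
    """Boolean mask of pixels in view_a whose color bin is absent from view_b."""
    qa = quantized_chroma(view_a, bins)
    qb = quantized_chroma(view_b, bins)
    return ~np.isin(qa, np.unique(qb))
```

A practical version would likely tolerate small chromaticity shifts, for example by comparing against dilated histograms, instead of requiring exact bin matches.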

Another approach we introduce is to obtain reflection properties using only multiple views,
without any color information. With densely sampled views over a wide angle and with known viewing
directions, the view sampling algorithm reconstructs object structure as well as separating
specularities from Lambertian reflections. The view sampling algorithm is applicable to
dielectrics and metals.

If object structure is reconstructed assuming Lambertian consistency for both Lambertian and
specular reflections, the structure reconstructed from the specular reflections will not in general
represent the real object surface, while the one reconstructed from the Lambertian reflections
does. By examining the differently reconstructed object structures from specular and Lambertian
reflections, we can identify the reflection types and the real object structure.


Figure  1:  Spectral  differencing


Figure  2:  Spectral  differencing

We adopted an algorithm for computerized tomography, through photometric modeling, for the
reconstruction of object structure. Figure 3 shows the camera control scheme and Figure 4 (a)
shows 4 out of 30 view samples of a gray dielectric object from different viewpoints. Figure 4 (c)
and (d) show the reconstructed structures at cross sections 1 and 2, illustrated in Figure 4 (b),
respectively. As shown in Figure 4 (c) and (d), the structure reconstructed from specularities at
cross section 2 differs from the real object surface reconstructed from Lambertian reflections.

The future direction of our studies is the integration of many cues in the light of active vision
[2]. Active vision involves not only the modeling of physical sensing and data processing for
vision modules (local models), but also the control of the modules (global model). Global models
characterize the overall performance and make predictions on how the individual modules will
interact, which in turn determines how intermediate results are combined. It is the global model
that analyzes and combines the information from many visual cues to assign stable descriptors. For
more stable descriptions of reflection properties in more general environments, it is desirable to
extract extra information from a synergistic combination of multiple cues. The spectral
differencing algorithm demonstrates the synergy from combining color and multiple views. There is
also potential for extra information from the combination of color, polarization and multiple
views.


Figure  4:  View  sampling 
4  Surface and Volumetric Segmentation of Complex 3-D Objects Using Parametric Shape Models

The problem of part definition, description, and decomposition is central to shape recognition
systems. In this paper we present an integrated framework for segmenting dense range data of
complex 3-D scenes into their constituent parts in terms of surface (bi-quadric) and volumetric
(superquadric) primitives, without a priori domain knowledge or stored models. Our objective is to
recover a structured description of complex 3-D objects, guided entirely by the geometric
properties of the shape models. The resulting decomposition into parts is very useful for
high-level processes, which can attach domain-specific labels to the parts and reason at a level
where the visual input is structured in terms of geometric primitives, rather than coping with the
difficulties of low-level vision and a huge amount of unstructured data.

Since the shapes have to be recovered from raw data, it is not possible to invoke complex models
(models with hundreds of degrees of freedom) straight away. It is, however, feasible and
perceptually less ambiguous to use simpler but powerful models that can capture the local and
global properties of the object shapes and provide a first approximation to the more complex
models. With computability, simplicity, and the utility of the shape representation as our major
concerns, we use bi-quadrics and superquadrics as our surface and volumetric models, respectively.
We have developed SUPERSEG (SUPERquadric SEGmentation), a control structure to carry out the
decomposition of complex objects in range images effectively and to address the numerous issues
encountered in a data-driven, bottom-up approach [13; 14; 15].

The SUPERSEG system (Figure 5) has five major components, namely: the bi-quadric surface
segmentation module; the module for extracting surface properties and adjacency relationships; the
superquadric model recovery module; the residual generation and analysis module; and the control
module for superquadric-based segmentation.

4.1  Surface  Segmentation:  Bi-quadric  Models 

The surface segmentation is performed by a novel local-to-global iterative regression approach
that searches for the best piecewise description of the data in terms of bi-quadric models
[16; 17]. The model-recovery module consists of independently extrapolating all the seed regions
and fitting the model using least-squares regression. Region growing is controlled by a
compatibility constraint whose value depends on the noise due to the sensor and quantization, as
well as on the allowed tolerance between the shapes of the model and the underlying data. Seed
regions are placed in a grid pattern all over the image and allowed to grow until they are either
completely grown or rejected by the model-selection procedure (which maximizes a linear
benefit-cost function). Instead of first growing all the regions and then invoking the
model-selection procedure (Recover-then-select), the model-recovery and model-selection processes
are dynamically combined (Recover-and-select) to achieve a computationally feasible and robust
method capable of rejecting outliers and determining its domain of applicability.


Figure  5:  The  SUPERSEG  system:  a  framework  for  surface  and  volumetric  segmentation.
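The least-squares step inside each growing region can be sketched as follows. The sketch assumes that the bi-quadric is written as a graph surface z = a0 + a1 x + a2 y + a3 x^2 + a4 xy + a5 y^2 over the image grid. It also uses a hypothetical scalar tolerance `tol` in place of the compatibility constraint. Seed placement, region growing and model selection are not shown.

```python
import numpy as np


def design_matrix(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Columns 1, x, y, x^2, xy, y^2 of the (assumed) bi-quadric model."""
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])


def fit_biquadric(x, y, z):
    """Least-squares bi-quadric coefficients and per-point residuals."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    A = design_matrix(x, y)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs, z - A @ coeffs


def compatible(x, y, z, coeffs, tol: float) -> np.ndarray:
    """Compatibility test: points within `tol` of the fitted surface may join
    the region. `tol` stands in for sensor/quantization noise plus the
    allowed shape tolerance."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    return np.abs(z - design_matrix(x, y) @ coeffs) <= tol
```

During growth, a region would be refit after each extension, and points failing `compatible` would stop the growth on that side.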

4.1.1  Refining Surface Segmentation & Extracting Surface Properties
The bi-quadric segmentation achieved by the above procedure needs refinement before it can be used
as an intermediate segmentation by superquadric-based volume segmentation. Also, the coefficients
of the second-order surfaces inherently carry information about orientation and surface type
(convex or concave). The orientation information is very useful in aligning the major axis of
cylindrical superquadric models. Further, due to the compatibility constraint, regions intersecting
to form surface normal (C1) discontinuities overlap in the vicinity of the discontinuity, thereby
localizing it. We developed a systematic method for tracing the bi-quadric intersection curve,
which is used to refine the segmentation as well as to localize the discontinuities (edges) and to
characterize them as convex or concave. In addition, a surface adjacency graph (SAG) is
constructed, with surface patches as nodes and discontinuity types as edges between them. The
information extracted from the bi-quadric patches is used by the volumetric segmentation module to
generate and test hypotheses.
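The surface adjacency graph is a small labeled graph. The minimal sketch below uses hypothetical patch identifiers and the two discontinuity labels described above. `convex_neighbors` shows one way such a graph could seed convex-part hypotheses for the volumetric module. It is an assumption for illustration, not the system's actual hypothesis generator.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (patch_a, patch_b, "convex" | "concave")


def build_sag(edges: Iterable[Edge]) -> Dict[str, Dict[str, str]]:
    """Undirected surface adjacency graph: patch -> {neighbor: edge type}."""
    sag: Dict[str, Dict[str, str]] = defaultdict(dict)
    for a, b, kind in edges:
        if kind not in ("convex", "concave"):
            raise ValueError(f"unknown discontinuity type: {kind}")
        sag[a][b] = kind
        sag[b][a] = kind
    return dict(sag)


def convex_neighbors(sag: Dict[str, Dict[str, str]], patch: str) -> List[str]:
    """Patches meeting `patch` along a convex edge (candidate same-part patches)."""
    return sorted(n for n, kind in sag.get(patch, {}).items() if kind == "convex")
```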

4.2  Superquadrics: Volumetric Part-Models

Superquadric models are convex part-models (except for the bent models) that can be recovered for a
given set of 3-D points by minimizing a function based on the modified implicit inside-outside
superquadric function […; 15]. This formulation enforces a minimum-volume constraint as well as a
surface constraint, but it cannot decompose the data set if no appropriate convex model can be
found in the model vocabulary. Thus, the superquadric model recovery module is adequate only for
recovering an optimal model (if oriented correctly) for a given data set, not for segmenting it. To
decide whether a recovered model is adequate for the given data set, we have developed an
exhaustive set of criteria comprising qualitative and quantitative measures. The quantitative
measures are the normalized global deviation of the model from the data. The deviation can be the
inside-outside function value, or it can be measured along the direction of the viewpoint
(Z-residuals for a range scanner), or along the direction of the minimum distance of a point from
the model (Euclidean distance). The qualitative measures are the ‘local’ residuals, characterized
by the clusters of 3-D points that are inside the model, on the model, or outside the model. Both
qualitative and quantitative measures are necessary for a complete evaluation of a recovered model.


Figure  6:  The NIST object. Top: the range image and its bi-quadric surface segmentation. Center:
the C1 (surface normal) edges marked at the overlapping parts of the surfaces. Following a
procedure similar to intersection cleaning, the edges are marked as convex or concave and a
surface adjacency graph (SAG) is constructed. Bottom: the three iterations of the global-to-local
procedure to extract the part-structure.
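For reference, the standard superquadric inside-outside function in the model's own frame is F(x, y, z) = ((x/a1)^(2/e2) + (y/a2)^(2/e2))^(e2/e1) + (z/a3)^(2/e1). Its value is F < 1 inside, F = 1 on, and F > 1 outside the surface. The sketch below evaluates F and groups points into the inside/on/outside clusters used by the qualitative measures. It also shows one widely used modified cost, sum of (sqrt(a1 a2 a3) (F^e1 - 1))^2, whose volume factor favors the smallest fitting model. Whether this is exactly the modification used here is an assumption. The pose transformation is omitted, and the band width `eps` is illustrative.

```python
import numpy as np


def inside_outside(points, a1, a2, a3, e1, e2) -> np.ndarray:
    """Superquadric inside-outside function F for points in the model frame.

    points: (N, 3) array; a1..a3 are the axis sizes, e1 and e2 the shape
    exponents. Absolute values keep the fractional powers real.
    """
    p = np.abs(np.asarray(points, dtype=float))
    x, y, z = p[:, 0] / a1, p[:, 1] / a2, p[:, 2] / a3
    xy = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1)
    return xy + z ** (2.0 / e1)


def fit_cost(points, a1, a2, a3, e1, e2) -> float:
    """Modified cost: sum of (sqrt(a1 a2 a3) * (F**e1 - 1))**2.
    The sqrt(a1 a2 a3) factor acts as a minimum-volume preference."""
    F = inside_outside(points, a1, a2, a3, e1, e2)
    return float(np.sum((np.sqrt(a1 * a2 * a3) * (F ** e1 - 1.0)) ** 2))


def classify(points, params, eps: float = 0.05) -> np.ndarray:
    """Label each point 'inside', 'on', or 'outside' relative to the model."""
    F = inside_outside(points, *params)
    return np.where(F < 1.0 - eps, "inside",
                    np.where(F > 1.0 + eps, "outside", "on"))
```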

4.3  Volumetric Segmentation: The Control Strategy

Since volumetric models do not have good surface support (as opposed to bi-quadric models), they
cannot be recovered by following exclusively the extrapolation method (local-to-global) used for
bi-quadrics. In order to obtain an optimal piecewise-convex volumetric segmentation, it is
necessary to proceed global-to-local, decomposing the data only if the global model is inadequate.
This allows controlled, residual-driven decomposition of 3-D data, as well as the introduction of
objective evaluation criteria for an acceptable description. However, the global-to-local method
can be aided by the bi-quadric segmentation in forming hypotheses about convex combinations of
surfaces; such hypotheses, although not true in general (an L shape, for example), can
significantly reduce the computational overhead when true for a particular part. Previous
researchers have assumed that a 1-to-1 mapping exists between surface patches and superquadric
models, which is also not true in general. The bi-quadric segmentation does, however, provide a
planarity check for the patches, as well as the orientation and shape of the individual patches in
3-space.

Thus, a strategy that combines the bi-quadric information with the global-to-local residual-driven method is most effective in recursively segmenting the scene to derive the part-structure [13]. A set of acceptance criteria based on the quantitative and qualitative measures provides the objective evaluation of intermediate descriptions and decides whether to terminate the procedure, selectively refine the segmentation, or generate a negative volume description. The control module generates hypotheses about superquadric models at clusters of underestimated data and performs controlled extrapolation of part-models by shrinking the global model. The recursive splitting of data results in a hierarchical part-structure comprising global and local models. The results of the complete processing of the range image of a machined object (from NIST) are shown in Figure 6.
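The recursive, residual-driven control loop can be illustrated with a deliberately simplified analog. The Python sketch below uses straight lines fitted to 2-D points ordered by x in place of superquadrics fitted to 3-D range data, an RMS threshold `max_rms` in place of the full acceptance criteria, and a hypothetical split rule (cut at the point with the largest residual); none of these choices is taken from the system described here. It only shows the global-to-local pattern: fit one global model, accept it if adequate, otherwise split the data and recurse, which yields a hierarchical part-structure.

```python
import math


def fit_line(pts):
    """Least-squares fit of y = m*x + c; returns (m, c) and the residuals."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    if denom == 0:
        # Degenerate case (all x equal): fall back to a horizontal line.
        m, c = 0.0, sy / n
    else:
        m = (n * sxy - sx * sy) / denom
        c = (sy - m * sx) / n
    residuals = [y - (m * x + c) for x, y in pts]
    return (m, c), residuals


def segment(pts, max_rms, min_size=3):
    """Global-to-local recursive decomposition of points ordered by x."""
    model, res = fit_line(pts)
    rms = math.sqrt(sum(r * r for r in res) / len(res))
    node = {"model": model, "size": len(pts), "rms": rms, "parts": []}
    # Accept the global model if it is adequate or too little data remains.
    if rms <= max_rms or len(pts) < 2 * min_size:
        return node
    # Otherwise split at the worst-fitting point, keeping both halves non-trivial.
    k = max(range(len(res)), key=lambda i: abs(res[i]))
    k = min(max(k, min_size), len(pts) - min_size)
    node["parts"] = [segment(pts[:k], max_rms, min_size),
                     segment(pts[k:], max_rms, min_size)]
    return node


# A "roof" profile: two line segments meeting at x = 5.
data = [(x, x) for x in range(5)] + [(x, 10 - x) for x in range(5, 10)]
tree = segment(data, max_rms=0.1)
print([(p["size"], round(p["rms"], 6)) for p in tree["parts"]])
```

The root model is rejected, and each half is then described exactly by its own line, giving a two-level hierarchy.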


We have tested the SUPERSEG system on real range images of scenes of varying complexity, including objects with occluding parts and scenes where surface segmentation is not sufficient to guide the volumetric segmentation. Applications of our approach include data reduction, 3-D object recognition, geometric modeling, automatic model generation, object manipulation, qualitative vision, and active vision.

5  A  Framework  for  Visual  Observation 

In this work we establish a framework for the general problem of observation, which may be applied to different kinds of visual tasks. We define “intelligent” high-level control mechanisms for the observer in order to achieve efficiency in recognizing different processes within a specific dynamic system. The intelligent observer is able to recognize the visual tasks, understand the meaning of the scene evolution, and successfully report on the current visual state. There is a clear need for high-level interpretation of actions within the environment and for guarantees on observation capabilities and stability within the viewing mechanism. The framework is a predictable one that satisfies the following general requirements:

•  Recognizes  visual  tasks  and  events. 

•  Repositions  itself  adaptively  and  intelligently. 

•  Operates  in  real  time. 

•  Asserts  and  reports  on  distinct  and  discrete  visual 
states. 

•  Utilizes the continuous parametric evolution of the visual system.

•  Accommodates  visual  uncertainties. 

We concentrate on observing a manipulation process in order to illustrate the ideas and motive behind our framework. Observing a robot hand manipulating an object is crucial for many robotic and manufacturing tasks. In an automated manufacturing environment, it is important to know whether the robot hand is performing the correct sequence of operations on an object (or on more than one object). The workspace of the robotic manipulator may also be inaccessible to humans, as in some space applications or some areas within a nuclear plant. In such a case, having another robot “look” at the process is a very good option. Thus, the observation process can be thought of as a stage in a closed-loop, fully automated system in which some robots perform the required manipulation task while other robots observe them and correct their actions when something goes wrong. Typical manipulation processes include grasping, pushing, pulling, lifting, squeezing, screwing and unscrewing. In this work, we address the problem of observing a single hand manipulating a single object and recognizing what the hand is doing. No feedback is supplied to the manipulating robot to correct its actions. We divide the problem into three major components. First, we identify a high-level framework for the visual states. Next, we define the events that cause state transitions. Finally, we utilize visual uncertainties to assert the state of the system.

Figure 7: A Model for a Grasping Task

Figure 8: Propagation of Uncertainty

5.1  State  Space  Modeling 

We use a discrete event dynamic system as a high-level structuring skeleton to model the visual manipulation system. Discrete event dynamic systems (DEDS) are dynamic systems (typically asynchronous) in which state transitions are triggered by the occurrence of discrete events in the system. Our formulation uses knowledge about the system and the different actions in order to solve the observer problem in an efficient, stable and practical way. The model incorporates different hand/object relationships and the possible errors in the manipulation actions. It also uses different tracking mechanisms so that the observer can keep track of the workspace of the manipulating robot. A framework is developed for the hand/object interaction over time, and a stabilizing observer is constructed. The construction process utilizes a task-dependent coarse quantization of the manipulation actions in order to attain an active, adaptive and goal-directed sensing mechanism. An example of a DEDS automaton for a simple grasping task is shown in Figure 7.
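A DEDS automaton of this kind can be encoded as a partial transition function from (state, event) pairs to next states. The Python sketch below is illustrative only: the state and event names (`hand_far`, `contact`, `object_lifted`, and so on) are hypothetical and do not reproduce the automaton in Figure 7, and raising an error on an event that is not enabled in the current state is just one possible design choice for the observer.

```python
# Hypothetical states/events for a simple grasp; not the automaton of Figure 7.
TRANSITIONS = {
    ("hand_far", "hand_approaches"): "hand_near",
    ("hand_near", "hand_retreats"): "hand_far",
    ("hand_near", "contact"): "grasping",
    ("grasping", "contact_lost"): "hand_near",
    ("grasping", "object_lifted"): "holding",
    ("holding", "object_dropped"): "error",
}


def observe(events, state="hand_far"):
    """Drive the automaton with a sequence of recognized events.

    Returns the visited states; an event that is not enabled in the current
    state raises ValueError so the observer can flag an unexpected observation.
    """
    trace = [state]
    for event in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            raise ValueError(f"event {event!r} not enabled in state {state!r}")
        state = nxt
        trace.append(state)
    return trace


print(observe(["hand_approaches", "contact", "object_lifted"]))
```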

5.2  Event  Identification 

Low-level modules are developed for recognizing the “events” that cause state transitions within the dynamic manipulation system. To observe how the events evolve over time, we must be able to identify how the hand moves and how the hand/object physical relationship evolves over time. We use a mix of 2-D and 3-D modules to recover a set of parameters that define the continuous parametric evolution of the scene under observation. The three-dimensional evolution of the hand motion is recovered by tracking a set of features, while two-dimensional cues to the number of objects, their relative locations, and their two-dimensional motion with respect to the manipulating hand are recovered in real time. The recovered events are then used to assert state transitions within the DEDS automaton. We also recover the uncertainties associated with the visual event recovery and utilize them for navigating the observer automaton.
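One simple way to turn a continuously recovered parameter into discrete events, in the spirit of the task-dependent coarse quantization mentioned in Section 5.1, is to quantize the parameter into a few intervals and emit an event whenever the quantized value changes. In the Python sketch below the parameter is a hand-to-object distance; the thresholds `NEAR` and `CONTACT` (in metres) and the interval names are hypothetical. A practical version would also need hysteresis or an uncertainty model to suppress spurious events near the thresholds.

```python
# Hypothetical thresholds (metres) for a coarse, task-dependent quantization.
NEAR = 0.10
CONTACT = 0.01


def quantize(distance, near=NEAR, contact=CONTACT):
    """Map a continuous hand-to-object distance to a coarse symbolic value."""
    if distance <= contact:
        return "contact"
    if distance <= near:
        return "near"
    return "far"


def extract_events(distances, near=NEAR, contact=CONTACT):
    """Emit (from, to) events whenever the quantized value changes."""
    events = []
    prev = quantize(distances[0], near, contact)
    for d in distances[1:]:
        cur = quantize(d, near, contact)
        if cur != prev:
            events.append((prev, cur))
            prev = cur
    return events


print(extract_events([0.5, 0.3, 0.08, 0.05, 0.005, 0.004]))
```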

5.3  Utilizing  Uncertainties 

This work closely examines the possibilities for errors, mistakes and uncertainties in the visual manipulation system, the observer construction process and the event identification mechanisms. We divide the problem into a number of major levels for developing uncertainty models in the observation process. The propagation of uncertainty is shown in Figure 8.

The sensor-level models deal with the problems in mapping 3-D features to pixel coordinates and the errors incurred in that process. We identify these uncertainties and suggest a framework for modeling them. The next level is the extraction strategy level, in which we develop models for the possibility of errors in the low-level image processing modules used to identify the features from which the 2-D evolution of the scene under consideration is computed. In the following level, we utilize the geometric and mechanical properties of the hand and/or object to reject unrealistic estimates for 2-D movements that might have been obtained from the first two levels. We transform the 2-D uncertainty models into 3-D uncertainty models for the structure and motion of the entire scene. The next level uses the equations that govern the 2-D to 3-D relationship to perform the conversion. We then reject the improbable 3-D uncertainty models for motion and structure estimates by using the existing information about the geometric and mechanical properties of the moving components in the scene. The highest level is the DEDS formulation with uncertainties, in which state transitions and event identification are asserted according to the 3-D models of uncertainty developed in the previous levels, and error recovery is performed according to the ordering of the recovered distributions.

Figure 9: Experimental Setting
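The last level can be illustrated with a small example. Suppose a recovered parameter, such as a hand-to-object distance, comes with a Gaussian uncertainty model, and each candidate DEDS state corresponds to an interval of that parameter. The Python sketch below computes the probability of each state and orders them; an observer could assert the most probable state and keep the lower-ranked ones as alternatives for error recovery. The Gaussian assumption, the interval boundaries and the function names are illustrative and are not taken from the observer described here.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma**2); handles +/- inf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def rank_states(mu, sigma, intervals):
    """Order candidate states by the probability mass of the measurement.

    `intervals` maps each state to the (low, high) range of the measured
    parameter that defines it; the measurement is assumed to be Gaussian.
    """
    probs = {
        state: normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
        for state, (lo, hi) in intervals.items()
    }
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical quantization of a hand-to-object distance (metres).
INTERVALS = {
    "contact": (-math.inf, 0.01),
    "near": (0.01, 0.10),
    "far": (0.10, math.inf),
}

ranking = rank_states(mu=0.09, sigma=0.02, intervals=INTERVALS)
for state, p in ranking:
    print(state, round(p, 3))
```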

5.4  Conclusions 

The approach can be considered a framework for a variety of visual tasks, as it offers a practical and feasible solution that uses existing information in a robust and modular fashion. The work closely examines the possibilities for errors and uncertainties in the manipulation system, the observer construction process and the event identification mechanisms. Ambiguities are allowed to develop and are resolved after finite time; recovery mechanisms are also devised. Details of the observer system can be found in [20; 21; 22; 23]. Theoretical and experimental aspects of the work support adopting the framework as a new basis for performing task-oriented recognition, inspection and observation of visual phenomena. The experimental setup with the observer and manipulating robots is shown in Figure 9.

6  Spatio-Variant  Sensing

Traditional imaging for robotics vision has relied almost exclusively on common commercial imagers, notably television format sensors. Their advantages are clear: the cameras are inexpensive and readily available, and the data are sampled on a “natural” cartesian (x, y) grid. These sensors have placed enormous demands, however, on processing architectures. The problem is not only that image analysis is an ill-defined task in the real world, but also that only very expensive machines can begin to process the data.

Over the last seven years an international team, led by Van der Spiegel at the University of Pennsylvania, Sandini at DIST in Italy, and Claeys at IMEC in Belgium, designed, built, and tested a new imaging chip called the Retina [24]. The new camera serves as the foundation for a new approach to robotics vision. At the systems level, we shift the focus from gathering better data and designing machines to analyze it, to gathering data for the computing resources that exist. The result is a prototype sensor that reduces the computational complexity of the problem by three orders of magnitude and, if scaled to commercial cameras, by six orders [25].

The Retina attempts to model the gross characteristics of the primate visual system in a mathematically elegant way. The computational savings arise from the same mechanism the eye uses, namely, maintaining one area of high resolution on the focal plane and dropping the resolution elsewhere. The mathematical expression of this is a log-polar mapping. That mapping transforms a polar data space, where a point has the polar coordinates (r, θ), by taking the logarithm of the expression for the point:

$$ z = r e^{i\theta} \quad\Rightarrow\quad \ln z = \ln(r) + i\theta = u + iv $$

This mapping has the useful property of separating rotations (changes in θ) from magnifications (changes in r). If the sensor has a uniform sampling grid in u (that is, in ln(r)), then the spacing of the grid in r grows exponentially with distance from the center. This models the growth of the receptive fields in primate retinas.
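A few lines of Python make the separation of rotation and magnification concrete. The sketch below computes (u, v) with the complex logarithm, exactly as in the expression above, and then checks that magnifying a point by a factor of 2 shifts u by ln 2 while rotating it by 0.3 rad shifts v by 0.3. The helper names `log_polar` and `rotate_and_scale` are illustrative, and the sketch works on continuous coordinates rather than on the Retina's discrete sampling grid.

```python
import cmath
import math


def log_polar(x, y):
    """Map a cartesian image point to log-polar (u, v) = (ln r, theta).

    Uses the complex logarithm ln(z) = ln|z| + i*arg(z), with arg in (-pi, pi].
    The origin has no log-polar image (cmath.log raises ValueError there).
    """
    w = cmath.log(complex(x, y))
    return w.real, w.imag


def rotate_and_scale(x, y, angle, scale):
    """Rotate a point about the origin by `angle` and magnify it by `scale`."""
    z = complex(x, y) * scale * cmath.exp(1j * angle)
    return z.real, z.imag


u0, v0 = log_polar(3.0, 4.0)
u1, v1 = log_polar(*rotate_and_scale(3.0, 4.0, angle=0.3, scale=2.0))
# Magnification shifts u by ln(scale); rotation shifts v by the angle
# (as long as the rotated angle does not wrap past +/- pi).
print(u1 - u0, math.log(2.0))
print(v1 - v0, 0.3)
```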

The Retina layout in Figure 10 implements this mapping by sampling in (r, θ) at points matching a uniform (u, v) grid. The sensor clearly has rotational symmetry and exponentially decreasing resolution. The circular section contains only 1920 pixels (30 circles of 64 pixels/circle); at the center is a dense rectangular grid of 102 additional photosites [26]. The cells grow fast: the outermost circle is over ten times as wide as the innermost. This leads directly to the small pixel count.
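The numbers in this paragraph can be checked with a short calculation. The sketch below counts the photosites (30 x 64 + 102) and, assuming that pixel width grows by the same factor from one circle to the next, derives the per-circle growth factor implied by a tenfold width ratio between the innermost and outermost circles; since the text says "over ten times", this is a lower bound. The 512 x 480 comparison frame is a hypothetical stand-in for a television-format camera, used only to show that the pixel count is roughly two orders of magnitude smaller, in line with the factor of about one hundred cited later in this section.

```python
RINGS = 30             # circles in the log-polar section
PIXELS_PER_RING = 64   # photosites per circle
FOVEA_PIXELS = 102     # dense rectangular grid at the center

total_pixels = RINGS * PIXELS_PER_RING + FOVEA_PIXELS


def ring_growth_factor(width_ratio, rings=RINGS):
    """Per-circle growth k such that k**(rings - 1) == width_ratio.

    Assumes pixel width grows geometrically from circle to circle, as it does
    on a uniform (u, v) grid where pixel size is proportional to r.
    """
    return width_ratio ** (1.0 / (rings - 1))


k = ring_growth_factor(10.0)
# Hypothetical comparison frame: 512 x 480 pixels for a television-format camera.
tv_ratio = (512 * 480) / total_pixels
print(total_pixels, round(k, 4), round(tv_ratio, 1))
```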

The chip, with its custom driving electronics, is now working at the GRASP laboratory [27] and is producing good pictures, as shown in Figure 11.

Clearly visible in the data space is the large magnification of the inner circles. The outer section provides much poorer data, with pixels widely spaced and averaging the incident light over a larger area. Still, the information these pixels provide is not useless.

The nature of the information has changed, however. We no longer get high-quality data across the focal plane. Indeed, we assume from the start that we will not try to build a model of the world in one step. Instead, we use the periphery to guide our attention, that is, where we point the camera. Implicit here is the idea of an active observer. The Retina, just sitting on a bench waiting for an object to enter its high-resolution spot, is useless. We must actively build the world by moving the camera, using the periphery to suggest candidates for attention.

The cost of using this sensor might be considered high. The new data space will require rewriting or adapting all our tools for the cartesian plane: this is the primary cost outside the hardware development. The advantages, however, suggest that the cost is worth paying. The Retina has some one hundred times fewer pixels than a standard television camera, which drastically reduces the computational burden of analysis, bringing it within the abilities of modern machines. The gains also include the rich mathematical structure of the mapping. That structure simplifies pattern matching by turning rotations and magnifications into linear shifts in the data space, and it speeds time-to-impact measurements, which need to look only at a radial flow. Some distortions introduced by the mapping, such as translational variance (linear translations become curves in the data space), also disappear in an active observer, where, for example, attention and tracking automatically compensate for linear motion.

Figure 10: The Retina CCD Imager

Figure 11: Picture of a mouse from the camera, centered between the buttons (to the left) and ball. The picture on the left is in the mapped plane: the vertical axis is v (θ, the angle of the point, increases moving down the axis) and the horizontal axis is u (the log of the radial distance of the point, increases to the right). The triangle at the upper left of the image is data remapped back onto a cartesian grid.
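The time-to-impact claim follows directly from the mapping. Assume a pinhole camera with focal length f, and a scene point at lateral distance R from the optical axis on a surface approaching the camera along that axis at speed V, so that its depth is Z(t). Its image radius is r = fR/Z, so u = ln r = ln(fR) - ln Z and du/dt = -(dZ/dt)/Z = V/Z = 1/τ, where τ = Z/V is the time to impact. The estimate τ = 1/(du/dt) therefore depends only on the radial flow in u, not on f, R, or where the point lies in the image. The Python sketch below applies this with a finite difference between two frames; the function names and the simulated numbers are illustrative assumptions.

```python
import math


def time_to_impact(u0, u1, dt):
    """Estimate time to impact from the radial log-polar flow du/dt.

    u0, u1 are the u = ln(r) coordinates of the same tracked point in two
    frames taken dt seconds apart; tau is approximately dt / (u1 - u0).
    """
    du = u1 - u0
    if du <= 0.0:
        # No expansion: the surface is not approaching (or motion is unmeasurable).
        return math.inf
    return dt / du


def image_u(f, R, Z):
    """u coordinate of a point at lateral offset R and depth Z (pinhole model)."""
    return math.log(f * R / Z)


# Simulated approach: depth 10 m, speed 1 m/s, so the true tau is 10 s.
f, Z0, V, dt = 0.01, 10.0, 1.0, 0.01
for R in (0.5, 2.0):  # the estimate does not depend on the point's offset
    tau = time_to_impact(image_u(f, R, Z0), image_u(f, R, Z0 - V * dt), dt)
    print(R, tau)
```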

Since the sensor began working this summer, our focus at the GRASP laboratory has been on redeveloping traditional image processing tools. Our work has looked at edge detection in the new data space, detecting lines using a Hough algorithm, calculating the centroid of an object, and measuring time-to-impact. Each of these areas requires an analysis of its mathematical basis under the log mapping and coding of the results on real images. All algorithms must also be computationally simple enough to work in a real-time environment.
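As an example of the kind of re-derivation involved, consider the centroid. On a uniform (u, v) grid a pixel at radius r = e^u covers an area of about r² Δu Δv, because dA = r dr dθ = r² du dθ, so peripheral pixels must be weighted much more heavily than central ones. The Python sketch below computes the area-weighted centroid of a binary object from the log-polar coordinates of its pixels, assuming a uniform grid so that Δu Δv cancels. It illustrates the idea only; it is not the GRASP implementation, and it ignores the rectangular central grid.

```python
import math


def centroid_log_polar(pixels):
    """Area-weighted centroid (x, y) of object pixels given as (u, v) pairs.

    Each pixel at radius r = exp(u) is weighted by r**2, its relative area on a
    uniform (u, v) grid (dA = r**2 du dv); the constant du*dv cancels out.
    """
    sw = sx = sy = 0.0
    for u, v in pixels:
        r = math.exp(u)
        w = r * r
        sw += w
        sx += w * r * math.cos(v)
        sy += w * r * math.sin(v)
    if sw == 0.0:
        raise ValueError("object has no pixels")
    return sx / sw, sy / sw


# A full ring of 64 pixels is centered on the origin.
ring = [(0.0, 2.0 * math.pi * k / 64) for k in range(64)]
print(centroid_log_polar(ring))
# Two pixels on the x-axis at r = 1 and r = 2: the outer one weighs 4 times more.
print(centroid_log_polar([(0.0, 0.0), (math.log(2.0), 0.0)]))
```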

This integration of sensor and computer is now the fundamental area of research involving the Retina at Penn. That the Retina works proves the concept of the hardware, that is, of designing custom imaging sensors for robots. The integration itself will prove the concept of the system. The Retina is the basic building block for a real-time interactive observer.

7  Conclusions  and  future  plans 

The development of an Active Observer is underway at the GRASP laboratory. Although future emphasis will be placed on the control structure of such an observer, its integration policies, and communication issues with other observers and agents in general, there is still a need for further studies, developments and improvements of component technologies. For example, in the case of understanding surface reflectance, we still have not completed the theoretical underpinning of transparency. With the problem of segmentation, while cooperation between surface and volumetric fitting is necessary and helps in resolving ambiguities, first- and second-order primitives are clearly not sufficient for modeling a broad class of real-life objects. Higher-order models will have to be invoked, but only selectively and locally after the lower-order fits have failed. If this order of fitting the data is violated, instabilities in the fitting procedures can be expected. Finally, there is the question of the control mechanism of the Active Observer. As shown above, we have employed the Discrete Event Dynamic System model. DEDS is a suitable formalism to model continuous processes of observation, as well as events occurring in discrete intervals. As a result, this model allows us to predict the observation capability as defined by the control theory community. The assumption here, however, is that the task of observation is known a priori in terms of the discrete events. While in the original theory the transitions from one state/event to another were discrete, we have extended the theory to transitions with uncertainties. The next task should be to loosen the requirements for explicit knowledge of the desired observable events. These events should be able to be generated from some rules of physics, geometry and other conventions of the object's and the agent's interactions. In conclusion, we are on our way to completing an Active Observer with a control structure that allows us to predict observation capabilities. The components developed here allow the Active Observer to handle moderately complex scenes of shapes/materials, their spatial arrangements and their illuminations. Real-time processing is a crucial issue, hence our efforts in special-purpose CCD chips and related hardware. The open questions are many, but we wish to concentrate on the intercommunication of several observers and other agents, such as manipulatory, mobile and human agents. Ultimately, the final issue is this: who tells what and how much, and to whom.